Abstract. A perceptron with N random weights can store of the order of N patterns by removing a fraction of the weights without changing their strengths. The critical storage capacity is calculated as a function of the concentration of the remaining bonds, both for random outputs and for outputs given by a teacher perceptron. A simple Hebb-like dilution algorithm is presented which, in the teacher case, reaches the optimal generalization ability.



PACS numbers: 07.05.Mh, 05.20.-y, 05.90.+m, 87.10.+e 
1. Introduction 

Neural networks are able to learn from examples and to find an unknown rule. Storage capacities and generalization abilities have been calculated for a variety of network architectures within the framework of statistical mechanics [?]. Special interest has been devoted to diluted networks, where only a fraction of the neurons is connected [?]. The most popular example of a diluted network in nature is the human brain: every neuron is connected to roughly 10000 others, whereas the total number of neurons is about $10^7$ times larger. Theoretical studies have indeed shown that the effective storage capacity per neuron in diluted systems can be substantially larger than in undiluted networks [?].

So far, dilution of synapses has been considered in addition to the usual dynamical modification of the bonds, which takes place in the learning phase [?]. Biological observations [?] indicate that, at early stages of the development of the brain, synapses are removed if their strength is not appropriate. Motivated by this, we address the question of whether it is possible to store patterns in a network with randomly chosen coupling strengths only by removing a fraction of these bonds, without changing their strength. This is a nontrivial task since, given a specific set of patterns, it is a priori not clear which of the bonds have to be removed. Previous studies [?] have considered a learning algorithm which removes weights that are frustrated in at least one of the patterns. However, this simple method removes too many weights, so that the storage capacity increases only as $\log N$. In this paper we show that it is possible to learn of the order of $N$ patterns perfectly, and to generalize, by removing a fraction of the weights. We focus on the perceptron, as it is the simplest network for which a tractable calculation is feasible. The same procedure, however, should be applicable to general network classes such as multi-layer perceptrons or attractor networks.

The paper is organized as follows: In section two we introduce the model and 
calculate the critical storage capacity for a random input-output relation following 
the standard statistical mechanics approach established by Gardner [?]. Section
three examines the properties of a perceptron which learns from an undiluted 
teacher perceptron. A simple Hebb-like dilution algorithm that reaches the optimal 
generalization ability is presented in section four. In the last section we close with a 
summary. 



2. The model for random input-output relations

The diluted perceptron classifies an input pattern $\xi^\mu$ according to

$$ s^\mu = \mathrm{sgn}\Big( \frac{1}{\sqrt{N}} \sum_{i=1}^{N} c_i J_i \xi_i^\mu \Big), \qquad (1) $$

where the $J_i$ are the components of the weight vector, drawn at random from a distribution $P(J_i)$. The $c_i$ are binary decision variables which take the values 0 or 1 and determine whether the $i$th coupling $J_i$ has been removed or is kept, respectively. For a given set of input-output pairs $\{\xi^\mu, s^\mu\}_{\mu=1}^{p}$ the classification is correct if

$$ s^\mu \frac{1}{\sqrt{N}} \sum_{i=1}^{N} c_i J_i \xi_i^\mu > 0 \qquad \forall\, \mu = 1, \ldots, p. \qquad (2) $$



We suppose that the inputs $\xi_i^\mu$ are drawn at random from a distribution with zero mean and unit variance, and we choose $s^\mu = \pm 1$ with equal probability and independently of the inputs. The concentration of remaining bonds is defined as $c = N^{-1} \sum_{i=1}^{N} c_i$ and thus lies between 0 and 1.
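
The definitions above can be written down directly. The following is only a minimal sketch: the function names are illustrative and do not come from the paper, and a vanishing local field is mapped to +1 by an arbitrary convention.

```python
import math


def local_field(c, J, xi):
    """Local field (1/sqrt(N)) * sum_i c_i J_i xi_i of the diluted perceptron."""
    n = len(J)
    return sum(ci * Ji * x for ci, Ji, x in zip(c, J, xi)) / math.sqrt(n)


def classify(c, J, xi):
    """Output of equation (1); a zero field is mapped to +1 (assumed convention)."""
    return 1 if local_field(c, J, xi) >= 0 else -1


def stores_all(c, J, patterns, outputs):
    """Check condition (2) for every input-output pair."""
    return all(s * local_field(c, J, xi) > 0 for xi, s in zip(patterns, outputs))


def concentration(c):
    """Fraction of remaining bonds, c = (1/N) * sum_i c_i."""
    return sum(c) / len(c)


# Toy example: the undiluted weights misclassify the pattern,
# but removing two of the three bonds stores it.
J = [1, -1, -1]
xi = [1, 1, 1]
print(stores_all([1, 1, 1], J, [xi], [1]))  # False
print(stores_all([1, 0, 0], J, [xi], [1]))  # True
```

The toy example illustrates the central idea of the paper on the smallest possible scale: the strengths $J_i$ are never changed, only the choice of which bonds are kept.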

We are interested in whether the maximum number of correctly classified patterns, $p_{\max}$, can be of the order of the input dimension $N$, resulting in a critical storage capacity $\alpha_c = p_{\max}/N$ for a fixed value of the concentration $c$ of remaining bonds in the thermodynamic limit ($N \to \infty$). Let us first consider the extreme cases. For $c = 0$ all bonds have been removed and no classification is possible, so that $\alpha_c(c = 0) = 0$. For $c = 1$ all bonds are present and the classification is random, so that $\alpha_c(c = 1) \to 0$ as $N \to \infty$. For intermediate values of $c$ we shall calculate $\alpha_c(c)$ using Gardner's phase space approach [?].
From the technical point of view the problem is related to the Ising perceptron [7], [8] and other discrete models [?], as well as to the knapsack problem [10], [11], where binary dynamical variables also appear. Note that, in contrast to the common approach where the couplings $J_i$ are the dynamical variables, here they represent, in addition to the patterns, a quenched disorder which has to be averaged out.

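For a fixed concentration $c$, the number of allowed configurations according to (2) is given by

$$ \mathcal{N}(c) = \sum_{\{c_i\}} \prod_{\mu=1}^{p} \theta\Big( s^\mu \frac{1}{\sqrt{N}} \sum_{i=1}^{N} c_i J_i \xi_i^\mu - \kappa \Big)\, \delta\Big( \sum_{i=1}^{N} c_i - cN \Big). \qquad (3) $$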

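We have introduced, as usual, the stability parameter $\kappa$, which should be positive. The corresponding entropy per bond of the microcanonical ensemble follows from

$$ S(c) = \frac{1}{N} \langle\langle \ln \mathcal{N}(c) \rangle\rangle = \lim_{n \to 0} \frac{1}{nN} \ln \langle\langle \mathcal{N}^n(c) \rangle\rangle, \qquad (4) $$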

where the last equality results from the replica trick. The quenched averages $\langle\langle \ldots \rangle\rangle$ have to be performed over the distributions of the patterns, the outputs and the couplings. Following the steps of the calculation by Gardner and Derrida (1988), one can rewrite the replicated number of configurations using the integral representation of the theta function in (3). The averages over the pattern and output distributions lead to the exponential factor

$$ \prod_{i=1}^{N} \prod_{\mu=1}^{p} \exp\Big( -\frac{J_i^2}{2N} \Big( \sum_{\alpha=1}^{n} x_\mu^\alpha c_i^\alpha \Big)^2 \Big). \qquad (5) $$

Here the $x_\mu^\alpha$ are the conjugate variables to the local fields and $\alpha$ denotes the replica index running from 1 to $n$. The average over the couplings still has to be done. A straightforward evaluation, however, leads to an expression which cannot conveniently be rewritten as an exponential, because the argument of the exponent in (5) is not necessarily infinitesimal, owing to the sum over all patterns. Instead, we introduce at this point the order parameters

$$ q_{\alpha\beta} = \frac{1}{N} \sum_{i=1}^{N} J_i^2 c_i^\alpha c_i^\beta \qquad (\alpha, \beta = 1, \ldots, n;\ \alpha < \beta) \qquad (6) $$

$$ Q_\alpha = q_{\alpha\alpha} = \frac{1}{N} \sum_{i=1}^{N} J_i^2 c_i^\alpha \qquad (\alpha = 1, \ldots, n) \qquad (7) $$

and leave the average over the distribution of the couplings for the integral representations of the delta functions fixing $q_{\alpha\beta}$ and $Q_\alpha$. In addition there is a third order parameter $E_\alpha$, which fixes the concentration $c$. We seek a replica symmetric solution, i.e. $q_{\alpha\beta} = q$, $Q_\alpha = Q$ and $E_\alpha = E$. Carrying out the sum over the $c_i^\alpha$ and using the saddle-point method, we obtain for the entropy in the thermodynamic limit:

$$ S(c) = \alpha \int Dt\, \ln H\Big( \frac{\kappa + \sqrt{q}\, t}{\sqrt{Q - q}} \Big) + \frac{1}{2} F q - \frac{1}{2} f Q + \frac{1}{2} c E + \int dJ\, P(J) \int Dt\, \ln\Big( 1 + \exp\Big( \sqrt{F}\, |J|\, t + \frac{1}{2} \big( f J^2 - F J^2 - E \big) \Big) \Big), \qquad (8) $$



where $F$ and $f$ are the conjugate order parameters to $q$ and $Q$, respectively, and we have used the notation

$$ Dt = \frac{dt}{\sqrt{2\pi}} \exp\Big( -\frac{1}{2} t^2 \Big), \qquad H(x) = \int_x^{\infty} Dt. \qquad (9) $$
For the distribution of the couplings $P(J_i)$ we focus on two cases: $|J_i| = 1$ (note that the entropy (8) depends only on the absolute value of $J_i$) and $J_i$ drawn from a normal distribution, i.e. $dJ\, P(J) = DJ$. In the first case it follows from (7) and the definition of $c$ that $Q = c$. We solve the saddle-point equations $\partial S/\partial q = \partial S/\partial F = \partial S/\partial E = 0$. The critical storage capacity as determined by Gardner and Derrida would be reached as $q$ approaches $c$. This, however, leads to a negative entropy of the system, as frequently observed in discrete problems [13]. We therefore identify the critical storage capacity as the value of $\alpha$ at which the entropy (8) vanishes, which has by now become the standard procedure [?]. The resulting curve for the critical storage capacity $\alpha_c$ is shown in figure 1 as a function of the concentration $c$. As expected, $\alpha_c$ vanishes at both extremes of $c$. We observe a maximum of $\alpha_c \approx 0.59$ at $c \approx 0.32$. This means that about 2/3 of the bonds have to be removed in order to reach the maximal value of $\alpha_c$. This is somewhat surprising since, as a function of $c$, the number of configurations $\{c_i\}$ is maximal at $c = 0.5$ and is there exponentially larger in $N$ than for any other value of the concentration. We will come back to this point in section 4.

It is worth noting that the case $|J_i| = 1$ can be mapped onto the Ising perceptron with couplings 0 or 1, since in (2) one can define the new patterns $\chi_i = J_i \xi_i$ and view the $c_i$ as the couplings. The distribution of the $\chi_i$ also has zero mean and unit variance. The critical storage capacity of the (0,1) Ising perceptron has been calculated by Gutfreund and Stein [?] with the zero-entropy (ZE) ansatz and was found to be 0.59, in agreement with the value found here at the maximum.

We performed an analysis of the local stability of the replica symmetric (RS) solution according to de Almeida and Thouless [15] and obtained the curve denoted as the AT-line in figure 1. For values of $\alpha$ that lie above this curve the RS solution is locally unstable. Our result for $\alpha_c(c)$ lies below the AT-line for all $c$ and is therefore locally stable. Nonetheless, global stability is not assured. For this reason we also performed complete enumerations of all possible dilution vectors for finite systems. The dots with their corresponding error bars are results for systems with $9 < N < 24$. For $c = 1$ finite size effects lead to $\alpha_c \sim N^{-1}$, since the probability of classifying one pattern correctly by chance is 1/2. In contrast, for values of $c$ around the maximum, the numerical results seem to underestimate the theoretical values. A finite size scaling analysis for $c = 1/3$ gives the extrapolated value $\alpha_c(1/3) = 0.586 \pm 0.004$ for $N \to \infty$, in agreement with the RS solution (0.58935) at the same concentration. The general shape of the curve is well confirmed by the numerical results.
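
A complete enumeration of dilution vectors is feasible only for small $N$, because the number of candidates grows as $\binom{N}{cN}$. The sketch below shows one possible way to organize such an enumeration for $|J_i| = 1$ and binary inputs. The text does not specify the procedure used for figure 1, so the protocol (adding random patterns one by one until no dilution vector stores them all) and all function names are assumptions made for illustration.

```python
import itertools
import math
import random


def count_solutions(J, patterns, outputs, n_keep, kappa=0.0):
    """Count dilution vectors with exactly n_keep remaining bonds that satisfy (2)."""
    n = len(J)
    scale = math.sqrt(n)
    # a[mu][i] = s^mu * J_i * xi_i^mu is the contribution of bond i to pattern mu.
    a = [[s * Ji * x for Ji, x in zip(J, xi)] for xi, s in zip(patterns, outputs)]
    count = 0
    for kept in itertools.combinations(range(n), n_keep):
        if all(sum(row[i] for i in kept) / scale > kappa for row in a):
            count += 1
    return count


def largest_storable_p(n, c, rng):
    """Add random patterns one by one until no dilution vector stores them all."""
    J = [rng.choice((-1, 1)) for _ in range(n)]
    n_keep = round(c * n)
    patterns, outputs = [], []
    while True:
        patterns.append([rng.choice((-1, 1)) for _ in range(n)])
        outputs.append(rng.choice((-1, 1)))
        if count_solutions(J, patterns, outputs, n_keep) == 0:
            return len(patterns) - 1


rng = random.Random(0)
samples = [largest_storable_p(9, 1 / 3, rng) for _ in range(20)]
print("estimated capacity for N = 9, c = 1/3:", sum(samples) / (20 * 9))
```

Such small systems carry strong finite size effects, which is why the text relies on a finite size scaling analysis rather than on a single system size.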

In the case where the $J_i$ are drawn at random from a normal distribution the picture changes quantitatively. Now, for a fixed $i$, $\langle J_i \xi_i^\mu \rangle = 0$ as before, but $\langle (J_i \xi_i^\mu)^2 \rangle = J_i^2 \langle (\xi_i^\mu)^2 \rangle = J_i^2$, which is in general different from unity, in contrast to the previous case. As a consequence $Q$ is different from $c$ and we have to solve the additional saddle-point equations $\partial S/\partial Q = \partial S/\partial f = 0$. For $\kappa = 0$ the ZE condition yields the critical storage capacity that is also depicted in figure 1. Similarly, the curve has a maximum, at $c \approx 0.34$; the critical capacity, however, is lowered over a wide range of the concentration with respect to the binary case. Our interpretation of this effect is that in the Gaussian case the dilution variables are mainly used to remove the large couplings ($|J_i| > 1$) and only few of them remain for learning. Therefore the storage capacity $\alpha_c$ is lower than for the binary weights. The order parameter $Q$ measures the effective size of the remaining components $J_i$. For all $c$ the RS solution gives $Q < c$, supporting the above argument. In addition, we measured the probability distribution of the size of the remaining couplings in complete enumerations and found that large couplings are likely to be removed. The values of $\alpha_c$ for finite $N$ are displayed in figure 1 as well.



3. Learning with a teacher 

A teacher perceptron $B$ classifies a pattern $\xi^\mu$ according to

$$ s^\mu = \mathrm{sgn}\Big( \frac{1}{\sqrt{N}}\, B \cdot \xi^\mu \Big). \qquad (10) $$

We choose a teacher vector which is not diluted and which has the normalization $B^2 = N$. Unlike in other discrete systems, where the structures of teacher and student coincide [23], a transition to perfect generalization through dilution cannot be expected. The student perceptron with components $c_i J_i$ can only remove part of its weights in order to learn perfectly a set of examples given by (10), resulting in a finite storage capacity.



A straightforward evaluation of the entropy under the RS assumption yields:

$$ S(c) = \ldots - \frac{1}{2} f Q + \frac{1}{2} c E + \frac{1}{2} G R + \int dB\, P_T(B) \int dJ\, P(J) \int Dt\, \ln\Big( 1 + \exp\Big( \sqrt{F}\, |J|\, t + \frac{1}{2} \big( f J^2 - F J^2 - E - G B J \big) \Big) \Big), \qquad (11) $$

where the overlap $R = \sum_i c_i J_i B_i / (\sqrt{Q} N)$ between diluted student and teacher and its conjugate $G$ have been introduced. $P_T(B)$ is the distribution of the teacher components.
We focus again on the two cases where $|J_i| = |B_i| = 1$, or where both are chosen independently at random from a normal distribution. The corresponding critical storage capacity $\alpha_c$, determined with the ZE condition, is shown in figure 2 as a function of the concentration of remaining bonds. Once more, we observe a maximum of $\alpha_c$ at $c \approx 1/3$. The critical storage capacity is higher than for random outputs, indicating that the problem, although unlearnable, is easier with examples from a teacher.

An important point to note is that the generalization ability, defined as the probability of classifying a new, unseen pattern correctly (i.e. as the teacher does), is poor compared to the value which could be achieved by intelligent dilution. For $c = 0.3$ we find $R \approx 0.32$ in the binary case $|J_i| = |B_i| = 1$, whereas an overlap of $R = \sqrt{0.3} \approx 0.55$ would be possible according to the following argument. The product $B_i J_i$ should be $+1$ for as many sites as possible in order to maximize $R$. Since $B_i$ and $J_i$ are drawn at random, they will coincide in $N/2$ of the cases for $N \to \infty$. For $c < 0.5$ we choose $c_i = 1$ only at sites for which $B_i J_i = +1$, up to a total number of $cN$, so that $R_{\max} = cN/(\sqrt{c} N) = \sqrt{c}$. If $c$ is larger than 0.5, then we also have to add $(cN - N/2)$ times a value of $-1$, and $R_{\max} = (N/2 - cN + N/2)/(\sqrt{c} N) = (1 - c)/\sqrt{c}$. Perfect storage without errors lowers the overlap $R$, an effect known as overfitting, which can be overcome by allowing a finite training error (see section 4).
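
The counting argument for $R_{\max}$ can be checked numerically. The sketch below assumes that the teacher is known, which is not the learning setting of the paper; it only illustrates the upper bound on the overlap. The function names and the system size are illustrative choices.

```python
import math
import random


def max_overlap(J, B, c):
    """Greedy dilution: keep the c*N bonds with the largest products B_i * J_i.

    Returns R = sum_i c_i J_i B_i / (sqrt(Q) * N) with Q = (1/N) sum_i c_i J_i^2.
    """
    n = len(J)
    keep = round(c * n)
    # Sort sites by B_i * J_i, most positive first, and keep the best ones.
    order = sorted(range(n), key=lambda i: B[i] * J[i], reverse=True)
    kept = order[:keep]
    overlap = sum(J[i] * B[i] for i in kept)
    q_cap = sum(J[i] ** 2 for i in kept) / n
    return overlap / (math.sqrt(q_cap) * n)


def predicted_overlap(c):
    """Closed form quoted in the text for |J_i| = |B_i| = 1."""
    return math.sqrt(c) if c <= 0.5 else (1.0 - c) / math.sqrt(c)


rng = random.Random(0)
N = 20000
J = [rng.choice((-1, 1)) for _ in range(N)]
B = [rng.choice((-1, 1)) for _ in range(N)]
for c in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(c, round(max_overlap(J, B, c), 3), round(predicted_overlap(c), 3))
```

For $c < 0.5$ the greedy choice reproduces $\sqrt{c}$ up to rounding, while for $c > 0.5$ the agreement holds up to fluctuations of order $N^{-1/2}$ in the number of sites with $B_i J_i = +1$.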

Up to now we have assumed that student $J$ and teacher $B$ are uncorrelated before the dilution. In biological systems, however, we would rather expect to find synaptic structures which are already prepared for a specific task before the learning process starts. In our model we can mimic this by allowing an initial positive overlap $R_0$, and therefore a certain similarity, between teacher and student. Let us choose

$$ P(J) = \frac{1 + R_0}{2}\, \delta(J - B) + \frac{1 - R_0}{2}\, \delta(J + B) \qquad (12) $$

with $0 \le R_0 \le 1$. In $(1 + R_0)N/2$ of the cases the components of teacher and student will coincide and in $(1 - R_0)N/2$ they will be opposed. Since we are not allowed to change the values of the $J_i$, but at best to remove the bond, we will not reach perfect generalization even for large $R_0$. The same would hold even if we chose a diluted teacher.

We have calculated the critical storage capacity for $|B_i| = 1$ as a function of $R_0$ and $c$; the results are plotted in figure 3. For $R_0 = 0$ we recover the uncorrelated case, whereas with increasing $R_0$ the storage capacity is enhanced for all $c$. At the same time the maximum of the curve moves towards higher values of the concentration, as fewer bonds have to be removed in order to mimic the teacher. For $R_0 = 1$ teacher and student are identical before dilution. Although this situation is of less relevance from the biological point of view, it offers an interesting physical solution. If a fraction of the bonds is now removed, we obtain a finite storage capacity. With increasing concentration $c$ we would expect the capacity to increase as well. Above $c \approx 0.82$, however, we find that it decreases and finally tends to zero for $c \to 1$. At $c = 1$ we have $\alpha_c = \infty$ by definition, hence $c = 1$ is a singular point. This surprising behaviour may be understood if we look at the annealed approximation [22] for the entropy

$$ S_{\mathrm{ann}}(c) = \frac{1}{N} \ln \langle\langle \mathcal{N}(c) \rangle\rangle. \qquad (13) $$

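The averaged number $\langle\langle \mathcal{N}(c) \rangle\rangle$ of allowed configurations for fixed $c$ is simply given by the total number $\binom{N}{cN}$ times the probability that $p$ patterns are classified correctly:

$$ \langle\langle \mathcal{N}(c) \rangle\rangle = \binom{N}{cN} \Big( 1 - \frac{1}{\pi} \arccos R \Big)^{p}. \qquad (14) $$

For $R_0 = 1$ we have $R = \sqrt{c}$ and, using the ZE condition, one obtains in the thermodynamic limit for the critical storage capacity in the annealed approximation

$$ \alpha_{\mathrm{ann}}(c) = \frac{c \ln c + (1 - c) \ln(1 - c)}{\ln\big( 1 - \frac{1}{\pi} \arccos \sqrt{c} \big)}, \qquad (15) $$

which for $c \to 1$ results in

$$ \alpha_{\mathrm{ann}}(c \to 1) \to \lim_{c \to 1} \big( -2\pi \sqrt{1 - c}\, \ln \sqrt{1 - c} \big) \to 0. \qquad (16) $$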

Since $\alpha_{\mathrm{ann}}(c)$ is an upper bound for $\alpha_c(c)$, the critical storage capacity has to decrease to zero as well when $c$ tends to 1. From (14) we see that, although the probability of classifying one pattern correctly tends to one for $c \to 1$, at the same time the total number of dilution vectors decreases rapidly, such that the averaged number of allowed configurations is no longer exponentially large in $N$. Perfect storage of patterns is different from optimal generalization, which in this case becomes better the closer $c$ is to 1. In figure 3 we also include results from complete enumerations of systems with $25 < N < 400$ for $R_0 = 1$ and $c$ close to 1. They confirm that $\alpha_c$ decreases in this region.
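
The annealed bound for $R_0 = 1$ is easy to evaluate numerically. The sketch below implements the zero-entropy expression obtained from (13) and (14) with $R = \sqrt{c}$, together with its leading behaviour for $c \to 1$; the function names are illustrative.

```python
import math


def alpha_annealed(c):
    """Annealed zero-entropy capacity for R_0 = 1, valid for 0 < c < 1."""
    entropy_term = c * math.log(c) + (1.0 - c) * math.log(1.0 - c)
    log_prob = math.log(1.0 - math.acos(math.sqrt(c)) / math.pi)
    return entropy_term / log_prob


def alpha_annealed_near_one(c):
    """Leading behaviour -2*pi*sqrt(1-c)*ln(sqrt(1-c)) for c close to 1."""
    eps = math.sqrt(1.0 - c)
    return -2.0 * math.pi * eps * math.log(eps)


for c in (0.5, 0.9, 0.99, 0.999, 0.999999):
    print(c, alpha_annealed(c), alpha_annealed_near_one(c))
```

The printed values decrease towards zero as $c \to 1$, although only slowly, since the approach is governed by $\sqrt{1 - c}\, \ln \sqrt{1 - c}$.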

4. A simple Hebb-like dilution algorithm

As we have seen in the previous sections, Gardner's method is very powerful when asking whether there exists, on average, a set of $c_i$ such that all perceptron conditions (2) are satisfied, but it does not provide us with the corresponding dilution vector for a specific set of patterns $\xi^\mu$, outputs $s^\mu$ and couplings $J_i$. The development of a learning algorithm, or in our case a dilution algorithm, is an independent task, which in the case of binary variables $c_i = 0, 1$ becomes extremely difficult and is comparable to the binary perceptron problem or the knapsack problem. In the worst case, the number of computational steps towards the optimal solution scales exponentially with the size $N$ of the system. The most successful approaches try to find the global minimum of a properly defined energy function, which penalizes the violation of constraints, by using sequential descent [?] or simulated annealing [?] strategies. Although an algorithm based on mean field annealing has proven to be very effective in finding solutions to the knapsack problem [10], none of the known techniques yields the critical values for the storage capacity predicted by Gardner calculations. Typically, in large systems, the solutions still violate a finite fraction of the imposed constraints.

In view of these general difficulties, we cannot expect to find, within reasonable time, the optimal dilution vector $\mathbf{c}$ which allows us to store perfectly $\alpha_c N$ patterns for large $N$. Instead, we present here a simple dilution algorithm which gives insight into the basic properties of the solutions and has optimal generalization ability for $\alpha \to \infty$.

Our aim is to fulfil all constraints (2) by removing a fraction $(1 - c)N$ of the bonds $J_i$. If we think of the terms $a_{i\mu} = s^\mu J_i \xi_i^\mu$ as matrix elements of an $N \times p$ matrix, then we want all $p$ vertical sums $\sum_{i=1}^{N} a_{i\mu}$ ($\mu = 1, \ldots, p$) to be positive. For this purpose, we are allowed to remove $(1 - c)N$ rows of the matrix. The idea is to remove those rows which contain many negative elements $a_{i\mu}$, since these contribute negatively to many vertical sums. Let us take away all rows $i$ whose horizontal sums $\sum_{\mu=1}^{p} a_{i\mu}$, scaled by $1/\sqrt{N}$, are smaller than a threshold $h$, so that

$$ c_i = \theta\Big( \frac{1}{\sqrt{N}} \sum_{\mu=1}^{p} a_{i\mu} - h \Big). \qquad (17) $$



The larger $h$, the more of the $c_i$ will be zero, leading to a lower concentration $c$. From a different perspective one can view (17) as comparing the Hebb couplings $H_i = \sum_{\mu=1}^{p} \xi_i^\mu s^\mu / \sqrt{N}$ of the problem (Hopfield 1982) with the $J_i$, the ones imposed at random. If their product $H_i J_i$ is larger than the threshold $h$, then $J_i$ is accepted as coupling strength. As $h$ becomes more and more positive, this is only the case if $H_i$ and $J_i$ agree in sign.
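
The dilution rule described above amounts to one pass over the training set followed by a threshold decision per bond. A minimal simulation sketch for random outputs, $|J_i| = 1$ and binary inputs is given below; the system size, the seed and the function names are arbitrary illustrative choices, and an odd number of patterns is used so that no Hebb coupling vanishes exactly at $h = 0$.

```python
import math
import random


def hebb_dilution(J, patterns, outputs, h):
    """Keep bond i only if the product H_i * J_i exceeds the threshold h."""
    n = len(J)
    c = []
    for i in range(n):
        # Hebb coupling H_i = sum_mu xi_i^mu s^mu / sqrt(N).
        hebb_i = sum(xi[i] * s for xi, s in zip(patterns, outputs)) / math.sqrt(n)
        c.append(1 if hebb_i * J[i] > h else 0)
    return c


def training_error(c, J, patterns, outputs):
    """Fraction of training patterns whose aligned local field is not positive."""
    wrong = 0
    for xi, s in zip(patterns, outputs):
        field = sum(ci * Ji * x for ci, Ji, x in zip(c, J, xi))
        if s * field <= 0:
            wrong += 1
    return wrong / len(patterns)


rng = random.Random(0)
N, p, h = 600, 301, 0.0
J = [rng.choice((-1, 1)) for _ in range(N)]
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(p)]
outputs = [rng.choice((-1, 1)) for _ in range(p)]

c = hebb_dilution(J, patterns, outputs, h)
print("concentration:", sum(c) / N)
print("training error:", training_error(c, J, patterns, outputs))
```

With $h = 0$ roughly half of the bonds are kept, and the training error falls clearly below the value 0.5 obtained without any selection, even though the rule never changes the strength of a bond.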

Let us now give a simple derivation of the critical storage capacity for random input-output pairs and $|J_i| = 1$, which results from (17) when a certain percentage of errors is allowed. For simplicity, suppose that $\xi_i^\mu = \pm 1$ with equal probability. Then $a_{i\mu} = \pm 1$ with equal probability, and in the limit $N \to \infty$ the horizontal and vertical sums have Gaussian distributions with zero mean and variance $p$ and $N$, respectively. According to (17) we remove all rows whose horizontal sum, scaled by $1/\sqrt{N}$, is smaller than $h$. The resulting concentration is $c = \int_{h/\sqrt{\alpha}}^{\infty} Dz = H(h/\sqrt{\alpha})$ and the new mean of the horizontal sums is $\langle HS \rangle = \sqrt{p}\, \exp\big( -\frac{1}{2} h^2/\alpha \big) / \sqrt{2\pi c^2}$. The new vertical sums still have a Gaussian distribution, but now with mean $\langle VS \rangle = c \langle HS \rangle / \alpha$ and variance $cN$ for $N \to \infty$. The fraction of errors is thus equal to the integral over the Gaussian tail below zero:

$$ \text{Learning error} = \varepsilon_L = \int_{-\infty}^{0} \frac{dx}{\sqrt{2\pi c N}} \exp\Big( -\frac{1}{2} \frac{(x - \langle VS \rangle)^2}{cN} \Big) = H\Big( \frac{\langle VS \rangle}{\sqrt{cN}} \Big). \qquad (18) $$



For a fixed error $\varepsilon_L = H(\Delta)$ and fixed concentration $c$, we obtain for the storage capacity:

$$ \alpha(c, \Delta) = \frac{\exp\big( -(H^{-1}(c))^2 \big)}{2\pi c \Delta^2}, \qquad (19) $$

with $H^{-1}(x)$ the inverse function of $H(x)$. Figure 4 shows the resulting $\alpha$ for $\Delta = 1$ ($\varepsilon_L \approx 15.9\%$) as a function of $c$. The maximum of $\alpha$ is reached at $c \approx 0.27$, which is not too far from 0.32, the concentration at which the maximal $\alpha_c$ was obtained in the Gardner calculation with the ZE condition (see figure 1). The shape of the curve is also similar; for different values of $\varepsilon_L$ (or $\Delta$), $\alpha(c, \Delta)$ is simply rescaled. At first sight one would expect that as $h$ increases the mean of the vertical sums $\langle VS \rangle$ also increases, leading to a lower learning error. This, however, is prevented by two effects. First, as $\langle HS \rangle$ increases, $\langle VS \rangle$ does not necessarily do so, since $\langle VS \rangle \sim c \langle HS \rangle$ and $c$ is lowered dramatically with increasing $h$. Taken alone, this effect would place the maximal storage capacity at $c = 0.5$ for fixed $\varepsilon_L$. The second effect is that the width of the distribution of vertical sums is proportional to $\sqrt{c}$, which lowers the learning error for smaller values of $c$. If $c$ is too small, however, the gain is compensated by the exponential factor in $\langle HS \rangle$, which tends to zero. As a consequence, the maximum storage capacity for fixed $\varepsilon_L$ lies at a value of $c$ somewhat smaller than 0.5.
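
The position of this maximum follows from the Gaussian argument alone and can be located numerically. The sketch below assumes the capacity in the form $\alpha(c, \Delta) = \exp(-(H^{-1}(c))^2)/(2\pi c \Delta^2)$, which is what the expressions for $c$, $\langle HS \rangle$ and $\langle VS \rangle$ given in the text imply; the function names and the grid resolution are illustrative.

```python
import math
from statistics import NormalDist

_normal = NormalDist()


def H(x):
    """Gaussian tail, H(x) = integral from x to infinity of Dt."""
    return 1.0 - _normal.cdf(x)


def H_inv(c):
    """Inverse function of H, valid for 0 < c < 1."""
    return _normal.inv_cdf(1.0 - c)


def learning_error(alpha, h):
    """Learning error for random outputs and |J_i| = 1 from the Gaussian argument."""
    c = H(h / math.sqrt(alpha))
    return H(math.exp(-h * h / (2.0 * alpha)) / math.sqrt(2.0 * math.pi * alpha * c))


def capacity(c, delta):
    """Storage capacity at concentration c for a learning error H(delta)."""
    return math.exp(-H_inv(c) ** 2) / (2.0 * math.pi * c * delta ** 2)


grid = [k / 1000 for k in range(1, 1000)]
c_best = max(grid, key=lambda c: capacity(c, 1.0))
print("learning error at Delta = 1:", H(1.0))
print("maximum at c =", c_best, "with alpha =", capacity(c_best, 1.0))
```

Since $\Delta$ enters only through the prefactor $1/\Delta^2$, the location of the maximum does not depend on the allowed error, in line with the remark that the curve is simply rescaled.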

In figure 5 the learning error $\varepsilon_L(c, \alpha)$ is plotted as a function of $\alpha$ for $h = 0$ (i.e. $c = 0.5$). For small $\alpha$, $\varepsilon_L$ is small, as is typical for the Hebb couplings, and it tends to 0.5 for $\alpha \to \infty$.

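A more interesting quantity is the generalization error $\varepsilon_G$, defined as the probability of misclassifying a new pattern $\xi^0$ which does not belong to the training set. For a random input-output relation $\varepsilon_G = 0.5$, since the classification $s^0$ of $\xi^0$ is random. In the presence of a teacher $B$, however, we can expect to reach a lower generalization error. A straightforward evaluation (see e.g. [?]) yields for $|J_i| = 1$:

$$ \varepsilon_L(c, \alpha) = 2 \int_0^{\infty} Dt\, H\Big( \frac{q/\sqrt{c} + R\, t}{\sqrt{1 - R^2}} \Big) \qquad (20) $$

$$ \varepsilon_G(c, \alpha) = \frac{1}{\pi} \arccos R \qquad (21) $$

with

$$ c = \frac{1}{2} \big\langle H(h_-) + H(h_+) \big\rangle_{B_i} \qquad (22) $$

$$ R = \frac{1}{2\sqrt{c}} \big\langle B_i \big( H(h_-) - H(h_+) \big) \big\rangle_{B_i} \qquad (23) $$

$$ q = \frac{1}{2\sqrt{2\pi\alpha}} \Big\langle \exp\Big( -\frac{1}{2} h_-^2 \Big) + \exp\Big( -\frac{1}{2} h_+^2 \Big) \Big\rangle_{B_i} \qquad (24) $$

where

$$ h_\pm = \frac{h}{\sqrt{\alpha}} \pm \sqrt{\frac{2\alpha}{\pi}}\, B_i. \qquad (25) $$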
The average is to be performed over the distribution $P_T(B_i)$ of the teacher components $B_i$. As before, $c$ is the concentration of remaining bonds and $R$ the normalized overlap between diluted student and teacher. The parameter $q$ (not to be confused with the $q$ defined by (6)) does not seem to have a direct physical meaning, but in a certain way it takes into account the randomness which is still inherent for $R$ less than 1. In the extreme case where we ignore the teacher by setting all $B_i = 0$, we obtain $R = 0$ and $q = \exp\big( -\frac{1}{2} h^2/\alpha \big) / \sqrt{2\pi\alpha}$, and recover for $\varepsilon_L$ the expression (18) for random outputs. The above result for $\varepsilon_L$ and $\varepsilon_G$ is similar to the one obtained with the clipped Hebb algorithm for the perceptron [?]. This is not surprising, as our prescription (17) is also, in some sense, a way of clipping the bonds.
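
For a binary teacher the averages over $B_i$ reduce to a single term, so the concentration, the overlap and the generalization error can be evaluated in a few lines. The sketch below assumes the shifted thresholds $h_\pm = h/\sqrt{\alpha} \pm \sqrt{2\alpha/\pi}\, B_i$, which is the form implied by a Gaussian approximation of the Hebb couplings in the presence of a teacher; this form and the function names are assumptions of the sketch.

```python
import math
from statistics import NormalDist


def H(x):
    """Gaussian tail probability, H(x) = 1 - Phi(x)."""
    return 1.0 - NormalDist().cdf(x)


def binary_teacher_overlap(alpha, h):
    """Concentration c and overlap R for |B_i| = |J_i| = 1.

    The average over B_i = +1 and B_i = -1 is symmetric, so B_i = +1 is used.
    """
    shift = math.sqrt(2.0 * alpha / math.pi)
    h_minus = h / math.sqrt(alpha) - shift
    h_plus = h / math.sqrt(alpha) + shift
    c = 0.5 * (H(h_minus) + H(h_plus))
    R = (H(h_minus) - H(h_plus)) / (2.0 * math.sqrt(c))
    return c, R


def generalization_error(R):
    """Generalization error (1/pi) * arccos(R)."""
    return math.acos(R) / math.pi


for alpha in (0.1, 1.0, 10.0, 100.0):
    c, R = binary_teacher_overlap(alpha, h=0.0)
    print(alpha, c, R, generalization_error(R))
```

At $h = 0$ the concentration stays at $c = 0.5$ for every $\alpha$, while the overlap grows with $\alpha$ and approaches $\sqrt{0.5}$, the largest overlap attainable at this concentration.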

For all $\alpha$ and $c$ we find $\varepsilon_L < \varepsilon_G$, and for $\alpha \to \infty$ with $c$ fixed, $\varepsilon_L \to \varepsilon_G$, as it should. The most important feature, however, is that in this same limit the optimal generalization error is reached. For $\alpha \to \infty$ and $c$ fixed by choosing $h$ appropriately, we obtain from (23) for $|B_i| = 1$:

$$ R = \begin{cases} \sqrt{c} & \text{for } c \le \frac{1}{2} \\ \dfrac{1 - c}{\sqrt{c}} & \text{for } c > \frac{1}{2} \end{cases} \qquad (26) $$

which is exactly the maximal overlap that can be achieved by removing $(1 - c)N$ bonds (see section 3). For a teacher with Gaussian $B_i$ we find in the same limit:

$$ R = \frac{1}{\sqrt{2\pi c}} \exp\Big( -\frac{1}{2} \big( H^{-1}(c) \big)^2 \Big), \qquad (27) $$

which is also the optimal overlap for this case, as can easily be shown.



5. Conclusion 



We have shown that it is possible to store information in a neural network solely by dilution of synapses. Using Gardner's phase space approach, the critical storage capacity of a perceptron with a random coupling vector was calculated as a function of the concentration of remaining bonds for random input-output relations. We found a maximum $\alpha_c \approx 0.6$ of the capacity at $c \approx 1/3$, i.e. after 2/3 of the bonds have been removed. Similar results are obtained if the desired outputs are generated by an undiluted teacher perceptron whose coupling vector is uncorrelated with the initial student vector. In this case perfect learning is possible only up to a critical capacity $\alpha_c(c)$. If the initial network has some prior knowledge, i.e. if there is a nonzero overlap between teacher and initial student vector, we find that the maximum of the capacity moves towards higher concentrations $c$.

The problem of finding the subset of couplings which have to be removed is extremely difficult and is comparable to problems which belong to the NP-complete class. Nevertheless, properties of a Hebb-like learning algorithm, which allows for a finite fraction of errors in the training set, were calculated. For $\alpha \to \infty$ the algorithm reaches the maximal overlap between the diluted random student and the undiluted teacher, and thus the lowest possible generalization error. Like the Hebb rule, it is a local algorithm that accumulates information about the training set and decides at the end of this batch process which of the couplings are removed. It would be more desirable to find a prescription that removes disturbing couplings on-line. In contrast to common on-line learning algorithms [12], where infinitesimal changes of the coupling vector are performed in every time step, here we would remove single couplings. Whether this procedure can give satisfactory results, similar to those obtained in the batch process, also for unlearnable rules, remains open and should be studied in the future.
Figure captions

Figure 1. The critical storage capacity $\alpha_c$ as a function of the concentration $c$ for $\kappa = 0$ and random outputs. The solid line represents the zero-entropy solution for $|J_i| = 1$. The dots with error bars are results from complete enumerations of systems with sizes $9 < N < 24$. The long-dashed curve is the corresponding AT-line, beyond which the RS solution becomes locally unstable. The dashed curve is the critical storage capacity for $J_i$ drawn from a normal distribution.

Figure 2. The critical storage capacity $\alpha_c$ as a function of the concentration $c$ for $\kappa = 0$ and outputs from an undiluted teacher perceptron. The upper curve is for $|J_i| = |B_i| = 1$ and the lower for $J_i$ and $B_i$ both Gaussian. The dots are numerical results from complete enumerations of systems with sizes up to $N = 20$.

Figure 3. The critical storage capacity $\alpha_c$ as a function of the concentration $c$ for $\kappa = 0$ and outputs from a teacher with $|B_i| = 1$ for different values of the initial overlap $R_0$ between $J$ and $B$: $R_0 = 1.0, 0.9, 0.7, 0.3, 0.0$ (from top to bottom). The dots are numerical results from complete enumerations of systems with sizes $25 < N < 400$ and $c$ close to 1.

Figure 4. The storage capacity $\alpha$ as a function of the concentration $c$ for $|J_i| = 1$, $\kappa = 0$ and $\Delta = 1$ ($\varepsilon_L \approx 15.9\%$). The lower curve is for randomly chosen outputs and the upper curve for outputs from a teacher with $|B_i| = 1$.

Figure 5. Learning error $\varepsilon_L$ (solid lines) and generalization error $\varepsilon_G$ (dashed lines) as a function of $\alpha$ for $h = 0$ ($c = 0.5$) and $|J_i| = 1$ for random outputs, a binary teacher and a Gaussian teacher.



